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Due  to  recent  advances  in  hyperspectral  imaging  sensors 
many  subtle  unknown  signal  sources  that  cannot  be  resolved  by 
multispectral  sensors  can  be  now  uncovered  for  target  detection, 
discrimination,  and  identification.  Because  the  information 
about  such  sources  is  generally  not  available,  automatic  target 
recognition  (ATR)  presents  a  great  challenge  to  hyperspectral 
image  analysts.  Many  approaches  developed  for  ATR  are 
based  on  second-order  statistics  in  the  past  years.  This  paper 
investigates  ATR  techniques  using  high  order  statistics.  For 
ATR  in  hyperspectral  imagery,  most  interesting  targets  usually 
occur  with  low  probabilities  and  small  population  and  they 
generally  cannot  be  described  by  second-order  statistics.  Under 
such  circumstances,  using  high-order  statistics  to  perform  target 
detection  have  been  shown  by  experiments  in  this  paper  to  be 
more  effective  than  using  second  order  statistics.  In  order  to 
further  address  a  challenging  issue  in  determining  the  number 
of  signal  sources  needed  to  be  detected,  a  recently  developed 
concept  of  virtual  dimensionality  (VD)  is  used  to  estimate  this 
number.  The  experiments  demonstrate  that  using  high-order 
statistics-based  techniques  in  conjunction  with  the  VD  to  perform 
ATR  are  indeed  very  effective. 
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Recently,  a  new  generation  of  remote  sensing 
instruments  with  very  high  spectral  resolution, 
called  imaging  spectrometers  or  hyperspectral 
imaging  sensors  have  been  developed  to  uncover 
subtle  material  substances  that  generally  cannot 
be  resolved  by  multispectral  sensors.  Therefore, 
hyperspectral  sensors  provide  a  new  dimension  in 
applications  that  cannot  be  addressed  by  classical 
spatial  domain-based  techniques.  One  of  such 
applications  is  automatic  target  recognition  (ATR)  in 
finding  targets  that  are  relatively  small  and  generally 
occur  in  low  probabilities  with  no  prior  knowledge. 
More  importantly,  when  these  targets  appear,  their 
population  is  usually  not  too  large.  Of  particular 
interest  is  that  the  size  of  a  target  may  be  smaller  than 
the  pixel  size.  In  this  case,  the  target  is  embedded  in 
a  single  pixel  and  cannot  be  identified  by  its  spatial 
presence.  Unfortunately,  many  such  targets  exist, 
such  as  special  species  in  agriculture  and  ecology, 
toxic/metal  waste  in  environmental  monitoring, 
rare  minerals  in  geology,  drug  trafficking  in  law 
enforcement,  and  small  combat  vehicles  in  battlefield 
to  name  just  a  few. 

Two  types  of  targets  are  generally  of  interest 
for  ATR  and  have  been  studied  extensively  in  the 
literature.  One  is  anomaly  with  signature  spectrally 
distinct  from  its  surroundings.  Another  is  endmember 
that  is  defined  as  an  idealized  and  pure  signature 
for  a  class  [1].  Depending  upon  which  type  of 
targets  is  of  interest,  different  approaches  have 
been  investigated  and  developed.  As  for  anomaly 
detection,  a  popular  and  well-known  algorithm, 
generally  referred  to  as  RX  algorithm  was  suggested 
by  Reed  and  Yu  [2]  who  formulated  a  binary 
composite  hypothesis  testing  problem  to  derive  a 
constant  false  alarm  probability  (CFAR)  detector 
which  turned  out  to  be  the  Mahalanobis  distance  [3]. 
Others  include  Chang  et  al.’s  RX  algorithm-based 
anomaly  detection  and  classification  in  [4],  Ashton's 
adaptive  Bayesian  classifier  in  [5],  Schweizer  and 
Moura’s  Gauss-Markov  random  field  (GMRF)  in 
[6],  independent  component  analysis  (ICA)-based 
linear  spectral  random  mixture  analysis  [7]  and 
projection  pursuit  [8].  As  for  the  endmember 
extraction,  two  most  popular  and  widely  used 
algorithms  are  pixel  purity  index  (PPI)  available 
in  the  Research  Systems  ENVI  software  [9]  and 
N-finder  (N-FINDR)  developed  by  Winter  et  al. 

[10].  Additionally,  many  other  approaches  have 
been  also  developed,  which  include  convex  cone 
analysis  (CCA)  [11],  unsupervised  fully  constrained 
least  squares  (UFCLS)  [12],  iterative  error  algorithm 
(IEA)  [13],  automated  morphological  endmember 
extraction  (AMEE)  [14],  and  projection  pursuit  [15]. 
A  recently  developed  approach,  called  automatic 
target  detection  and  classification  algorithm  (ATDCA) 
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[16-17],  which  made  use  of  a  sequence  of  orthogonal 
subspace  projections  (OSPs)  [18]  has  been  shown  to 
be  effective  in  automatic  target  extraction. 

As  mentioned  above,  since  the  targets  of  interest 
generally  occupy  a  few  pixels,  they  may  not  be 
able  to  constitute  reliable  second-order  statistics.  In 
this  case,  their  presence  may  be  more  effectively 
captured  through  high-order  statistics.  In  order  to 
take  advantage  of  high-order  statistics,  projection 
pursuit  and  ICA  were  introduced  as  alternatives 
for  anomaly  detection  [7-8,  17]  and  endmember 
extraction  [15,  19].  The  idea  was  based  on  an 
assumption  that  if  the  image  background  can  be 
characterized  by  second-order  statistics,  anomalies 
can  be  then  viewed  as  outliers  as  opposed  to  the 
image  background  due  to  the  fact  that  their  sizes 
are  relatively  small  and  spectral  features  are  very 
different  compared  with  their  surroundings.  As  a 
result,  anomaly  detection  can  be  performed  more 
effectively  by  searching  for  deviations  from  the 
background  distribution.  Similarly,  the  occurrence 
of  pure  signatures  (endmembers)  in  a  real  image 
scene  is  usually  very  low  and  rare.  Therefore,  their 
existence  can  be  more  effectively  characterized  by 
high-order  statistics.  In  doing  so,  the  simplest  higher 
order  statistics  are  skewness  and  kurtosis  which  are 
the  normalized  third  and  fourth  central  moments  and 
can  be  used  to  measure  the  asymmetry  and  flatness 
of  distribution  respectively.  If  the  image  background 
is  assumed  to  be  Gaussian  distributed,  the  skewness 
and  kurtosis  can  be  used  to  measure  the  difference  of 
a  distribution  from  Gaussianity.  Using  this  property 
as  a  criterion,  skewness  and  kurtosis  seem  to  be 
appropriate  measures  in  detection  of  anomalies  or 
small  targets.  This  paper  investigates  ATR  using 
high-order  statistics  and  also  explores  approaches 
to  finding  optimal  projection  directions  so  that  the 
projected  data  have  the  maximal  high-order  statistics. 
Since  high-order  statistics-based  methods  generally 
require  calculation  of  a  sequence  of  projections 
which  can  be  very  computationally  expensive  and 
cumbersome,  this  paper  further  develops  efficient 
algorithms  for  finding  these  projections  with  greatly 
reduced  computational  complexity. 

Another  challenging  issue  for  ATR  is  to  determine 
the  number  of  targets  needed  to  be  generated  without 
prior  knowledge.  This  is  equivalent  to  determining 
how  many  projections  are  required  to  be  generated  for 
projection  pursuit-based  approaches  [7-8],  how  many 
endmembers  assumed  to  be  present  in  the  data  for 
endmember  extraction  [9-15],  or  how  many  targets 
are  extracted  for  target  detection  and  classification 
[16-17].  A  general  approach  is  either  to  use  trial 
and  error  or  to  compute  the  accumulated  eigenvalues 
to  account  for  a  certain  percentage  of  total  energy. 
Unfortunately,  neither  method  is  effective  as  shown 
in  [17]  and  [20].  In  order  to  address  this  issue,  a 


recently  developed  concept  of  virtual  dimensionality 
(VD)  is  used  to  determine  the  number  of  projections 
required  for  ATR.  The  idea  of  the  VD  has  its  root  in 
Neyman-Pearson  detection  theory  [21]  which  uses 
the  false  alarm  probability  as  a  criterion  to  estimate 
the  number  of  spectrally  distinct  signal  sources 
present  in  the  data.  Using  the  VD  as  an  estimate,  the 
number  of  projections  can  be  reasonably  determined. 
Experiments  conducted  in  the  work  presented  here 
also  support  the  use  of  the  VD  in  determination  of 
number  of  projections  for  ATR. 

In  order  to  demonstrate  advantages  of  our 
proposed  high-order  statistics-based  methods  over 
second-order  statistics-based  methods,  the  RX 
algorithm  [2]  and  the  ATDCA  [16-17]  are  selected  for 
comparative  analysis  in  two  applications,  endmember 
extraction  and  target  detection  and  classification  where 
two  real  hyperspectral  image  data  sets  are  used  for 
experiments.  The  reason  for  such  selection  is  because 
these  two  techniques  are  second-order  statistics-based 
methods  and  each  of  them  uses  a  different  type  of 
criterion.  For  example,  the  RX  algorithm  is  derived 
from  a  Gaussian  kernel  or  Mahalanobis  distance  [2] 
and  the  ATDCA  is  developed  on  the  signal-to-noise 
ratio  (SNR)-based  OSP  [16-17], 

The  paper  is  organized  as  follows.  Section  II 
presents  iterative  methods  to  find  skewness  and 
kurtosis-based  projections  for  ATR  with  extension 
to  any  high-order  statistics.  Section  III  conducts  real 
hyperspectral  image  experiments  to  demonstrate 
the  performance  of  the  higher-order  statistics-based 
ATR  algorithms.  Finally,  Section  IV  draws  some 
conclusions. 


II.  ATR  USING  HIGH-ORDER  STATISTICS 


Assume  that  there  are  N  data  points  {xjV, ,  each 
of  which  has  dimensionality  L  and  X  =  [  x  |  x2  •  ■  ■  xv  ] 
is  an  L  x  N  data  matrix  formed  by  {xjV, .  Let  w 
be  an  L-dimensional  column  vector  and  assumed 
to  be  a  desired  projection  vector.  Then  z  =  wTX  = 
(zl,z2, •  ■  •  ,ZN)[  is  a  1  x  N  row  vector  which  represents 
the  projection  of  data  {x,}^  mapped  along  the 
direction  of  w,  where  T  denotes  the  transpose  of  a 
vector  or  matrix.  Now,  assume  that  F(  )  is  a  function 
to  be  explored  and  defined  on  the  projection  space 
z  =  wTX.  The  selection  of  the  function  F  depends 
upon  various  applications.  For  example,  in  order  to 
detect  small  targets  in  a  large  unknown  background, 
skewness  and  kurtosis  are  generally  used  as  criteria 
to  measure  asymmetry  and  flatness  of  a  distribution, 
respectively.  In  this  case,  F(.)  can  be  defined  by 
skewness  (k3)  with 


F(Zi)  =  k3(z,)  = 


g[(Z;-M)3] 


£[(wTx,.-p)3] 


for  each  i=l,2,...,N  (1) 
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which  is  the  normalized  third  central  moment,  kurtosis 
(k4)  with 

for  each  i=l,2,...,N  (2) 


Assume  that  { A; }  ,  are  the  eigenvalues  of  the 

sample  covariance  matrix  X  =  (1  //V)XXT  formed  by 
X  and  {v,}f=1  are  their  corresponding  eigenvectors. 

The  covariance  matrix  can  be  decomposed  into 

VTXV  =  A  (4) 


which  is  the  normalized  fourth  central  moment  and 
any  kth  order  statistic  n5  with 


F(Zi)  =  «*(Z<)  = 


EKZj-yif] 

rrk 


£[(wTx;-  -  /if] 


for  each  i  =  l,2,...,N.  (3) 

The  n  and  a  in  (l)-(3)  are  the  mean  and  standard 
deviation  of  random  variable  z(  ,  respectively.  Since 
small  targets  can  be  characterized  by  those  pixels  that 
cause  maximum  magnitude  of  asymmetry  and  ripples 
of  a  distribution,  finding  a  projection  vector  w  that 
maximizes  (1) — (3)  is  equivalent  to  finding  a  direction 
which  these  pixels  are  most  likely  aligned  with.  By 
projection  of  all  data  samples  {x(}A  ,  on  the  projection 
vector  w,  the  desired  small  targets  can  be  detected  by 
those  pixels  that  yield  the  largest  projection  along  the 
direction  of  w. 

If  we  assume  that  most  of  image  background 
can  be  described  by  second-order  statistics  and  the 
statistical  behaviors  of  targets  of  interest  go  beyond 
second-order  statistics,  a  logical  preprocessing 
for  detecting  such  targets  will  remove  the  image 
background  prior  to  target  detection.  In  doing  so,  we 
first  remove  the  sample  mean  and  de-correlate  the  data 
matrix  X  by  the  sphering  method  described  as  follows. 


where  V  =  LviV2  -  •  vL]  is  a  matrix  made  up  of  the 
eigenvectors  {v,}^)  and  A  =  diag{A,}f=1  is  a  diagonal 
matrix  with  L  eigenvalues  {A;}f=1  as  the  diagonal 

elements.  Let  A-1/2  =  diag{l/^/A^}^=|.  Multiplying 
both  sides  of  (4)  by  A-1/2  results  in 

A^1/2Vt5]YA~1/2  =  I.  (5) 


From  (5),  we  obtain  the  desired  sphering  matrix  A, 
given  by 

A  =  VA~1'/2  (6) 

so  that  ATXA  =  I.  The  data  set  resulting  from 
applying  the  sphering  matrix  A  to  the  original  data 
set,  {x,.}f=1  is  denoted  by  {y(}A  ,  and  the  process  of 
using  (4) — (6)  is  called  sphering  which  is  also  known 
as  a  whitening  process  of  X.  In  this  case,  the  data 
matrix  Y  has  zero  mean  and  an  identity  matrix  as  its 
covariance  matrix. 

If  we  replace  the  original  data  samples  {x(]A  1 
with  sphered  data  samples  {y,-}A  j  in  (3)  then  z  = 
wTY  -  wT[y1y2 •  •  -y*]  =  (wTy, , wTy2, . . .,wTyN)T  = 

(z, ,z2, . . . ,zN)v ■  Equation  (3)  can  be  reduced  to 

nk(Zi)  =  £[(wTy,y:]  for  each  i  =  1,2,..., A 

(7) 

with  subscript  k  indicating  k-order  statistic  for  k>  3. 


A.  Sphering 

The  idea  of  sphering  is  to  centralize  the  mean 
of  the  data  samples  {x(-}A  j  at  the  origin  while 
normalizing  the  data  variances  to  one.  In  this  case, 
two  sets  of  data  samples  can  be  categorized.  One  set 
is  made  up  of  all  data  samples  lying  on  the  surface  of 
a  sphere  centered  at  the  origin  with  unit  radius.  The 
set  of  these  data  samples  represents  uninteresting  data 
samples  which  may  include  most  image  background 
pixels.  The  second  set  of  data  samples  contains  all 
data  samples  which  are  not  on  the  sphere,  i.e.,  either 
inside  or  outside  the  sphere.  Only  these  data  samples 
are  of  major  interest  and  can  be  further  explored  by 
orders  of  statistics  higher  than  variance.  So,  working 
only  on  this  set  of  data  samples  may  exclude  most  of 
image  background  samples. 

In  order  to  perform  sphering,  we  first  remove 
the  sample  mean  of  data  set  by  X  =  X  —  /x  •  1T  = 

[x,  -/x,x2-/x,...,x;v-/x],  where  /x  =  ( 1  /N)  i  x;  is 
the  sample  mean  vector  and  1  =  [1 1  ■  ■  ■  1JT  is  column 

N 

vector  with  all  ones  in  the  components.  Next  step  we 
de-correlate  the  zero-mean  data  sample  matrix  X. 


B.  Algorithm  for  Finding  Projection  Images  for  ATR 


After  the  data  are  sphered,  the  next  task  is  to 
search  a  projection  vector  that  is  optimal  in  some 
sense.  If  the  skewness  is  used  as  a  criterion,  the 
projection  vector  should  be  the  one  that  points 
to  the  direction  where  the  projected  data  has  the 
most  asymmetric  histogram.  If  the  kurtosis  is  used 
as  a  criterion,  the  projected  data  will  yield  the 
most  heavy-tailed  histogram.  In  this  section,  an 
iterative  method  is  proposed  to  search  for  the  optimal 
projection  vector  based  on  these  criteria. 

To  find  the  projector  which  yields  the  maximum 
skewness,  we  impose  a  constrained  problem  as 
follows 


max 

W 


1 

N 


=  max 

W 


1  N 

i=  1 


T  T  T 

wyf-y/wwy- 


subject  to  wTw  =  1  (8) 


where  zt  is  the  projection  resulting  from  the  sphered 
data  sample  y(  via  the  projection  vector  w.  The 
constraint  wTw  =  1  is  used  for  normalization  such 
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that  the  skewness  of  the  resulting  data  after  projection 
will  not  be  affected  by  the  magnitude  of  w.  Using  the 
Lagrange  multiplier  method,  an  objective  function  is 
obtained  by 

7(w)  =  E[wl  yI.y'rwwl  y(]  -  A(wTw  -  1).  (9) 

Differentiating  (9)  with  respective  w  results  in 

=  SElytfJ  wyj  ]w  —  2Aw  =  0.  (10) 

Setting  A'  =  (2/3)A  yields 

(£[y,.yxwyx]-A'I)w  =  0.  (11) 

Solving  (11)  is  equivalent  to  finding  the  eigenvalue 
A'  of  the  matrix  £[y(yj  wy'r]  and  its  corresponding 
eigenvector  w*. 

When  using  kurtosis  as  the  searching  criterion,  the 
constrained  problem  becomes 

=  max|i^wTy,.y^wwTy,.y^J 

subject  to  wTw=l.  (12) 

In  analogy  with  (11)  we  can  also  obtain 

(£'[y,yxwwTy(yx]  -  A'I)w  =  0  (13) 

which  once  again  is  to  solve  the  eigenvalue  A' 
and  its  associated  eigenvector  w*  of  the  matrix 
£[y;yirwwTy,.y7].  The  obtained  w*  is  a  desired  the 
projector  that  yields  the  maximum  kurtosis. 

In  order  to  extend  the  above  treatment  to  any 
Ath  order  central  moment  we  solve  the  following 
eigen-problem 

(E[yi(yJw)k-2yJ-\  -  A'I)w  =  0  (14) 

which  is  the  eigenvector  of  E[y  [(yjw)k-2yj].  Using 
the  property  of  eigen-decomposition,  (14)  can  be 
reduced  to 

wT(E[y,(y;iw)t-2y;i])w  =  A'  (15) 

because  of  wTw  =  1.  Equation  (15)  can  be  further 
simplified  to 

E[wTyi{yJyv)k-2yJw]  =  E[{yjw)k]  =  E[zf]  =  A' 

(16) 

which  turns  out  to  be  the  Ath  central  moment  of 
z  =  (w*)TY. 

Since  a  single  projection  vector  w  that  solves 
(14)  can  only  detect  one  type  of  anomaly.  In  order 
to  detect  more  types  of  anomalies  present  in  an  image 
scene  a  sequence  of  projections  must  be  performed. 

In  doing  so,  when  a  projector  vector  w*  is  found,  the 
de-correlated  data  Y  is  then  mapped  into  the  linear 
subspace  (w*)x  orthogonal  to  (w*)  that  is  the  space 
linear  spanned  by  w*.  The  next  projection  vector  w* 
is  then  found  by  solving  (14)  in  the  space  (w*)x. 


The  same  procedure  is  continued  on  until  a  stop 
criterion  is  satisfied  such  as  the  predetermined  number 
of  projections  required  to  be  generated.  A  detailed 
implementation  of  finding  a  sequence  of  projection 
vectors  can  be  described  as  follows. 

Projection  Vector  Generation  Algorithm 

1)  Sphere  the  original  data  set  X.  The  resulting 
data  set  is  denoted  by  Y. 

2)  Find  the  first  projection  vector  wj  by  solving 
(13)  to  find  the  optimal  projection  vector  based  on 
maximizing  the  Ath  normalized  central  moment. 

3)  Using  the  found  w*,  generate  the  first 
projection  image  Z1  =  (wj)TY  =  {zj  z]  =  (wjj)Ty;} 
which  can  be  used  to  detect  the  first  type  of  anomaly. 

4)  Apply  the  OSP  specified  by  Px  =  I  — 

w [  (wx W[ )—  1  wx  to  the  data  set  Y  to  produce  the  first 
OSP-projected  data  set  denoted  by  Y1,  Y1  =  Pw  Y. 

5)  Use  the  data  set  Y1  and  find  the  second 
projection  vector  wj  by  solving  (14). 

6)  Apply  Px2  =  1  -w2(w]w2)  1  w2  to  the 
data  set  Y1  to  produce  the  second  OSP-projected 
data  set  denoted  by  Y2,  Y2  =  PWj  Y 1  which  can 
be  used  to  produce  the  third  projection  vector  wj 
by  solving  (14).  Or  equivalently,  define  a  matrix 
projection  matrix  W2  =  [w,  w2]  and  apply  Pw  = 

I  —  W2((W2)TW2)  '(W2)1  to  the  original  sphered  data 
set  Y  to  obtain  Y2  =  PW  Y. 

7)  Repeat  the  procedure  of  steps  5  and  6  to 
produce  wj,...,wjjj  until  a  predetermined  number  of 
projection  vectors  is  reached. 

It  should  be  noted  that  the  implementation  of 
step  2  in  the  above  algorithm  is  not  trivial.  In  order 
to  solve  (14)  for  the  optimal  projection  vector  wj  the 
following  iterative  procedure  is  proposed  to  execute 
the  step  2. 

a)  Initialize  a  random  projector  wj01  and  set  n  =  0 

b)  Calculate  the  matrix  £’[y,(yxw<1"))A_2yx] 
and  find  an  eigenvector  \\n>  corresponding  to  the 
largest  magnitude  of  eigenvalues  of  the  matrix 

£[y,-(y/wi',T"2y/’]- 

c)  If  the  Euclidean  distance  ||  w<|")  —  w(”+1)||  >  e 
or  || w(j"2  +  w("+1)||  >  e,  then  let  W"+l)  =  and  n  <— 
n  +  1,  go  to  step  (b).  Otherwise,  w(j' 1  is  the  desired 
projector  wj.  Let  wj  =  w"  and  return  to  step  3  in  the 
projection  vector  generation  algorithm. 

III.  EXPERIMENTS 

Two  sets  of  real  hyperspctral  image  data, 
airborne  visible  infrared  imaging  spectrometer 
(AVIRIS)  Cuprite  data  and  hyperspectral  digital 
image  collection  experiment  (HYDICE)  data  were 
used  for  experiments.  The  Cuprite  data  was  used  to 
demonstrate  advantage  of  high-order  statistics-based 
methods  over  second-order  statistics-based  methods 
in  endmember  extraction,  while  the  HYDICE 
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Fig. 


(a)  (b) 


(a)  Spectral  band  number  50  (827  nm)  of  Cuprite  AVIRIS  image  scene,  (b)  Spatial  positions  of  five  pure  pixels  corresponding 
to  minerals:  alunite  (A),  buddingtonite  (B),  calcite  (C),  kaolinite  (K),  muscovite  (M). 


data  was  used  to  show  superior  performance  of 
high-order  statistics-based  methods  to  second-order 
statistics-based  methods  in  target  detection  and 
classification. 

A.  Endmember  Extraction 

Endmember  extraction  is  one  of  fundamental 
tasks  in  hyperspectral  image  analysis.  It  finds  and 
identifies  the  purest  signatures  in  image  data  for 
various  applications  such  as  image  endmembers  used 
for  linear  spectral  unmixing,  training  samples  for 
unsupervised  image  classification,  etc.  The  proposed 
high-order  statistics-based  endmember  extraction  was 
evaluated  for  performance  analysis  where  the  widely 
used  endmember  extraction  algorithm  N-FINDR  was 
used  as  benchmark  comparison.  The  reason  to  choose 
N-FINDR  over  the  PPI  was  due  to  the  fact  that  the 
N-FINDR  is  available  in  the  literature  compared  with 
the  PPI  which  is  only  available  in  the  ENVI  software. 

The  first  image  data  was  collected  over  the 
Cuprite  mining  site,  Nevada,  in  1997,  and  is  shown 
in  Fig.  1(a).  It  is  a  224  band  AVIRIS  image  scene 
with  a  size  of  350  x  350  pixels,  is  well  understood 
mineralogically,  and  has  reliable  ground  truth  available 
at  website  [23]  where  the  five  pure  pixels  representing 
the  five  minerals,  alunite  (A),  buddingtonite  (B), 
calcite  (C),  kaolinite  (K),  and  muscovite  (M),  referred 
to  as  endmembers  are  white-circled  and  labeled  by  A, 

B,  C,  K,  and  M  in  Fig.  1(b).  This  fact  has  made  this 
scene  a  standard  test  site  for  endmember  extraction. 

It  should  be  noted  that  in  the  Cuprite  image  data, 
bands  1-3,  105-115,  and  150-170  have  been  removed 
prior  to  the  analysis  due  to  water  absorption  and  low 
SNR  in  those  bands.  As  a  result,  a  total  of  189  bands 
were  used  for  experiments. 

The  VD  estimated  for  this  image  scene  with 
different  values  of  false  alarm  probability  PF  is  given 


TABLE  I 

VD  Estimated  by  HFC  Method  with  Various  False  Alarm 
Probabilities 


II 

o 

1 

II 

o 

1 

to 

PF  =  10-3 

T4" 

1 

o 

II 

A5 

PF  =  10"5 

VD 

34 

30 

24 

22 

20 

TABLE  II 

Four  Endmembers  Extracted  by  N-FINDR  with  MNF-DR 

A 

B 

C 

K 

M 

A' 

0.0235 

0.1665 

0.2143 

0.1012 

0.1542 

C' 

0.2235 

0.1002 

0.0511 

0.2276 

0.1222 

K' 

0.0812 

0.1434 

0.1771 

0.0418 

0.1010 

M' 

0.1675 

0.0933 

0.0971 

0.1483 

0.0381 

in  Table  I  where  the  Harsanyi-Farrand-Chang  (HFC) 
method  in  [17,  20]  was  used  for  estimation. 

In  order  to  ensure  that  all  mineral  signatures 
of  interest  were  included  in  high-order 
statistics-generated  components,  the  false  alram 
rate  /p  chosen  for  the  VD  was  set  to  1 0  4  which 
resulted  in  VD  =  22.  However,  it  should  be  noted 
that  this  was  an  emiprical  choice.  Since  each 
component  image  represents  a  specific  class  of 
targets  it  can  be  used  to  serve  as  abundance  map 
for  this  particular  class.  Endmember  signature  is 
identified  as  the  purest  signature  in  the  scene,  in  this 
sense,  the  brightest  pixel  in  the  image  component 
(IC)  can  be  extracted  as  an  endmember.  Using  the 
spectral  angle  mapper  (SAM)  in  [1]  as  a  spectral 
similarity  measure,  the  pixels  whose  signatures  were 
identified  to  be  closet  to  the  ground  truth  mineral 
endmembers  were  extracted  as  candidate  endmembers. 
Table  II  tabulated  their  SAM  values  between  the  four 
N-FINDR-extracted  signatures,  denoted  by  A',  C', 

K',  M'  and  the  five  ground  truth  endmembers,  A,  B, 

C,  K,  M,  with  no  signature  found  to  correspond  to 
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TABLE  Ilia 

Five  Signatures  Extracted  by  Skewness 


TABLE  IIIc 

Five  Signatures  Extracted  by  5th  Moment 


A 

B 

C 

K 

M 

Order  of  IC 

A' 

0.0172 

0.1645 

0.2115 

0.0961 

0.1476 

15 

B' 

0.1591 

0.0745 

0.0793 

0.1767 

0.0969 

13 

C' 

0.2192 

0.1069 

0.0362 

0.2196 

0.1174 

11 

K' 

0.1029 

0.1619 

0.2025 

0.0300 

0.1183 

2 

M' 

0.1650 

0.0844 

0.1162 

0.1508 

0.0808 

17 

TABLE  Illb 

Five  Signatures  Extracted  by  Kurtosis 


A 

B 

C 

K 

M 

Order  of  IC 

A' 

0.0172 

0.1645 

0.2115 

0.0962 

0.1476 

10 

B' 

0.1591 

0.0749 

0.0793 

0.1768 

0.0969 

11 

e 

0.2192 

0.1069 

0.0362 

0.2196 

0.1174 

18 

K' 

0.0888 

0.1834 

0.2283 

0.0341 

0.1453 

7 

M' 

0.1458 

0.0781 

0.1141 

0.1347 

0.0706 

21 

A 

B 

C 

K 

M 

Order  of  IC 

A' 

0 

0.1576 

0.2038 

0.0961 

0.1421 

15 

B' 

0.1412 

0.0589 

0.0847 

0.1521 

0.0777 

21 

C' 

0.2192 

0.1069 

0.0362 

0.2196 

0.1174 

19 

K' 

0.0888 

0.1835 

0.2284 

0.0341 

0.1453 

7 

M' 

0.1067 

0.1293 

0.1558 

0.0688 

0.0711 

14 

TABLE  Hid 

Five  Signatures  Extracted  by  FastICA 


A 

B 

C 

K 

M 

Order  of  IC 

A' 

0 

0.1576 

0.2038 

0.0961 

0.1421 

17 

B' 

0.1385 

0.0623 

0.079 

0.1584 

0.0826 

13 

C' 

0.2192 

0.1069 

0.0362 

0.2196 

0.1174 

21 

K' 

0.1029 

0.1619 

0.2026 

0.03 

0.1183 

6 

M' 

0.1650 

0.0844 

0.1162 

0.1508 

0.0809 

18 

the  mineral  endmember  C.  It  should  be  noted  that 
when  the  N-FINDR  was  implemented,  it  required 
dimensionality  reduction  which  was  performed  by  the 
maximum  noise  fraction  (MNF)  [24]  and  the  number 
of  dimensions  to  be  retained  was  set  to  p,  in  this  case, 

p  =  22. 

In  order  to  implement  the  proposed  high-order 
statistics-based  ATR  for  endmember  extraction,  the 
three  criteria  of  skewness,  kurtosis,  5th  moment  as 
well  as  the  FastICA  were  evaluated  for  performance 
analysis. 

There  are  three  different  ways  to  generate  initial 
projection  vectors  to  be  used  for  the  high-order 
statistics-based  algorithms  and  FastICA: 

1)  random-generated  initial  projection  vector:  a 
unit  vector  with  components  randomly  generated, 

2)  unity-based  initial  projection  vector:  a  unit 
vector  with  all  ones  in  its  components, 

3)  eigenvector-based  initial  projection  vectors: 
the  eigenvectors  corresponding  to  the  p  largest 
eigenvalues  of  the  data  sample  covariance  matrix. 

Nevertheless,  according  to  our  experiments, 
the  results  derived  from  these  three  different  initial 
projection  vectors  were  very  similar  even  though 
their  produced  component  images  might  appear  in 
different  orders.  Therefore,  only  the  SAM  values 
between  the  ground  truth  endmembers  and  signatures 
extracted  by  algorithms  using  eigenvectors  as  initial 
projection  vectors  are  tabulated  in  Tables  Illa-IIId 
where  the  ICs  in  the  last  columns  indicate  the  orders 
of  the  ICs  generated  by  algorithms.  Figs.  2(a)-(d) 
also  show  the  gray  scale  component  images  produced 
by  the  three  algorithms  and  FastICA  corresponding 
to  the  components  identified  in  the  last  columns  of 
Tables  Illa-IIId. 

Four  criteria  of  high-order  statistics  were  used 
to  produce  image  components  for  ATR,  which 


are  to  maximize  skewness,  kurtosis,  5th  moment, 
and  statistical  independency.  The  results  are 
summarized  in  Table  IV  with  the  five  minerals 

A,  B,  C,  K,  M  listed  in  the  1st  row.  Since  the  ICs 
were  generated  sequentially,  the  orders  of  the  ICs 
generated  in  different  scenarios  are  usually  different. 
In  Table  IV  the  numbers  under  each  of  the  minerals 
indicate  that  the  orders  of  the  components  that 
specified  the  particular  minerals  were  generated. 

In  addition,  the  computing  time  in  seconds  for 
each  criterion  is  provided  in  the  last  column  in 
Table  IV. 

According  to  Table  IV,  all  the  minerals  were 
extracted  within  30  components  except  two  cases 
that  were  the  skewness  using  the  unity  vector  as 
the  initial  projection  vector  and  the  FastICA  using 
a  random  vector  as  initial  projection  vector,  each  of 
which  missed  one  type  of  mineral.  As  for  CPU  time, 
the  FastICA  required  the  least  time.  However,  this 
may  not  be  conclusive  since  our  algorithms  to  run 
high-order  statistics  were  not  optimized  in  terms  of 
source  codes.  Comparing  the  results  in  Tables  III-IV 
produced  by  the  high-order  ATR  to  the  results  in 
Table  II  produced  by  the  N-FINDR,  it  is  very  clear 
that  the  high-order  statistics-based  methods  usually 
performed  better  than  the  N-FINDR  in  endmember 
extraction  where  the  former  extracted  the  pixels 
corresponding  to  all  the  five  mineral  endmembers 
compared  with  only  four  endmembers  extracted  by 
the  N-FINDR. 

B.  Target  Detection 

The  second  mage  data  set  is  a  HYDICE 
image  scene  shown  in  Fig.  3(a)  that  was  used  for 
experiments  of  target  detection.  It  has  a  size  of 
64  x  64  pixel  vectors  with  15  panels  in  the  scene  and 
the  ground  truth  map  in  Fig.  3(b)  [17]. 
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Fig.  2.  (a)  Image  components  produced  by  skewness,  (b)  Image  components  produced  by  kurtosis.  (c)  Image  components  produced  by 

5th  moment,  (d)  Image  components  produced  by  FastICA. 


It  was  acquired  by  210  spectral  bands  with  a 
spectral  coverage  from  0.4  pm  to  2.5  pm.  Low 
signal/high  noise  bands  (bands  1-3  and  bands 
202-210)  and  water  vapor  absorption  bands  (bands 
101-112  and  bands  137-153)  were  removed.  So,  a 
total  of  169  bands  were  used.  The  spatial  resolution 
is  1.56  m  and  spectral  resolution  is  10  nm.  Within 
the  scene  in  Fig.  3(a)  there  is  a  large  grass  field 
background,  and  a  forest  on  the  left  edge.  Each 
element  in  this  matrix  is  a  square  panel  and  denoted 
by  Pjj  with  rows  indexed  by  i  and  columns  indexed  by 


j  =  1,2,3.  For  each  row  i  =  1,2,..., 5,  there  are  three 
panels  pn,  pi2,  pB,  painted  by  the  same  material  but 
with  three  different  sizes.  For  each  column  j  =  1,2,3, 
the  5  panels  pXj,  p2j,  p3; ,  p4; ,  p5/  have  the  same  size 
but  with  five  different  materials.  So,  panels  in  five 
different  rows  were  painted  by  five  different  materials. 
The  sizes  of  the  panels  in  the  first,  second,  and  third 
columns  are  3mx3m,  2mx2m  and  1  m  x  1  m 
respectively.  Since  the  size  of  the  panels  in  the  third 
column  is  1  m  x  1  m,  they  cannot  be  seen  visually 
from  Fig.  3(a)  due  to  the  fact  that  its  size  is  less 


1378 


IEEE  TRANSACTIONS  ON  AEROSPACE  AND  ELECTRONIC  SYSTEMS  VOL.  42,  NO.  4  OCTOBER  2006 


Eli,  Pi  2;  Pi  3 
P Zb  p22,  p23 
P3b  P32,  p33 

Pib  P42;  p43 
P,5b  P'2;  p53 


Fig.  3.  15-panel  HYDICE  image,  (a)  15-panel  image  scene,  (b)  Ground  truth  map  of  15  panels. 


TABLE  IV 

Order  of  Components  that  Specified  Minerals  were  Generated 


High-Order  Statistics 

Initial  Projection  Vectors 

A 

B 

C 

K 

M 

CPU  Time  (seconds) 

random  vectors 

13 

19 

12 

3 

18 

1548 

skewness 

unity  vector 

17 

Not  found 

11 

3 

21 

1686 

eigenvectors 

15 

13 

11 

2 

17 

1516 

random  vectors 

19 

15 

20 

4 

17 

4373 

Kurtosis 

unity  vector 

20 

4 

7 

3 

13 

5736 

eigenvectors 

10 

11 

18 

7 

21 

3928 

random  vectors 

14 

20 

15 

3 

13 

1853 

5  th  moment 

unity  vector 

15 

17 

21 

3 

14 

1677 

eigenvectors 

15 

21 

19 

7 

14 

1981 

random  vectors 

14 

13 

12 

7 

Not  found 

293 

FastICA 

unity  vector 

21 

14 

20 

4 

13 

340 

eigenvectors 

17 

13 

21 

6 

18 

306 

TABLE  V 

VD  Estimated  by  HFC  Method  with  Various  False  Alarm 
Probabilities 


II 

o 

1 

PF  =  10-2  pF  =  10-3  PF  =  10~4  PF  =  io-5 

VD  14 

11  9  9  7 

than  the  1 .56  m  pixel  resolution.  Fig.  3(b)  shows  the 
precise  spatial  locations  of  these  15  panels  where  red 
pixels  ( R  pixels)  are  the  panel  center  pixels  and  the 
pixels  in  yellow  (Y  pixels)  are  panel  pixels  mixed 
with  the  background.  The  1.56  m  spatial  resolution 
of  the  image  scene  suggests  that  most  of  the  15  panels 
are  one  pixel  in  size  except  p21,  Py\ ,  p4l,  p5l  which 
are  two-pixel  panels.  This  image  scene  provides  an 
excellent  example  for  ATR  since  it  contains  real 
subpixel  targets  and  mixed  pixels  that  cannot  be 
simulated  by  synthetic  images.  This  is  because  it  is 
difficult  to  simulate  a  synthetic  image  with  appropriate 
sample  spectral  correlation  to  reflect  real  data. 

The  VD  estimated  for  this  HYDICE  image  scene 
with  different  values  of  false  alarm  probability  PF  is 
given  in  Table  V. 

The  VD  was  chosen  to  be  9  with  the  false  alarm 
probability  set  to  PF  =  1 0  4  and  p  was  set  to  18  to 
ensure  that  all  the  15  panels  of  interest  included 


components  generated  by  high-order  statistics. 
Additionally,  the  same  three  ways  used  to  initial 
projection  vectors  for  Table  II  were  also  used  to 
initialize  algorithms  of  four  high-order  statistics, 
skewness,  kurtosis,  5th  moment,  and  statistical 
independency. 

The  results  are  summarized  in  Table  VI  with 
panels  in  five  rows  listed  in  the  1  st  row  where  the 
numbers  in  Table  VI  indicate  that  the  orders  of  the 
components  that  specified  the  panels  of  particular 
rows  were  generated.  In  analogy  with  Table  IV,  the 
least  CPU  time  also  came  from  the  FastICA. 

In  addition  to  Table  VI  the  ICs  produced  by 
the  skewness,  kurtosis,  5th  moment,  and  statistical 
independency  using  eigenvectors  as  initial  projection 
vectors  are  also  provided  in  Figs.  4(a)— (d)  to 
demonstrate  the  effectiveness  of  high-order  statistics 
for  ATR  in  target  detection  and  classification  where 
the  number  in  parenthesis  under  each  IC  was  the 
order  in  which  that  particular  image  component 
was  generated.  It  should  be  noted  that  only  the  ICs 
corresponding  to  the  five  panels  are  listed. 

These  figures  demonstrate  that  the  ICs  produced 
by  high-order  statitsics  could  be  used  for  target 
detection  and  classification  for  ATR.  On  the  other 
hand,  since  our  proposed  high-order  statistics-based 
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1st  panels  (2nd ) 


2nd  panels  (4th) 


4th  panels  (3rd) 


5lh  panels  (1st) 


3rd  panels(6th) 
(a) 


1st  panels  (3rd )  2nd  panels  (15th)  3rd  panels(  16th)  4lh  panels  (2nd)  5th  panels  (1st) 

(b) 


1st  panels  (3rd )  2nd  panels  (8lh)  3rd  panels(7,h)  4th  panels  (2nd)  5lh  panels  (1st) 

(c) 


1st  panels  (4th )  2nd  panels  (5th)  3rd  panels(9th)  4th  panels  (2nd)  5th  panels  (6th) 

(d) 

Fig.  4.  (a)  Image  components  produced  by  skewness  that  extracted  panels  in  five  rows  for  target  detection,  (b)  Image  components 

produced  by  kurtosis  that  extracted  panels  in  five  rows  for  target  detection,  (c)  Image  components  produced  by  5  th  moment  that 
extracted  panels  in  five  rows  for  target  detection,  (d)  Image  components  produced  by  FastICA  that  extracted  panels  in  five  rows  for 

target  detection. 

methods  are  completely  unsupervised  and  no  9  target  pixels  {tfTDCA}?=1  were  shown  in  Fig.  5(b) 

prior  knowldege  is  required,  the  second-order  with  numbers  indicating  the  orders  of  target  pixels 

statistics-based  RX  algorithm  for  anomlay  detection  being  generated  by  the  ATDCA.  According  to 

and  the  OSP-based  ATDCA  seem  to  serve  as  perfect  Fig.  5(b),  only  three  ATDCA-generated  target  pixels, 

candidates  for  comparison  due  to  the  fact  that  they  tATDCA,  tATDCA,  tATDCA  were  identified  to  represent 

both  are  completely  unsupervised  target  detectors  three  R  pixels  in  row  5,  3,  and  1,  respectively, 

and  require  no  prior  knolwedge.  Fig.  5(a)— (b)  shows  The  ATDCA  then  used  these  three  panel  pixels  to 

the  results  produced  by  the  RX  algorithm  and  the  perform  panel  classification.  The  results  are  shown 

ATDCA,  respectively,  where  the  ATDCA-generated  in  Fig.  6. 
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(a) 


(b) 


Fig.  5.  Detection  results  by  (a)  RX  algorithm,  and  (b)  ATDCA. 


,  •  ...ATDCA.  ,  .  ...  ATDCA.  .  .  -.  ATDCA  . 

panels  in  row  1(  t6  )  panels  in  row  3  ( t5  )  panels  m  row  5(  t3  ) 
Fig.  6.  Classification  results  by  ATDCA  using  these  panel  pixels. 


TABLE  VI 

Order  of  Components  that  Specified  Minerals  were  Generated 


High-Order 

Initial  Projection 

Panels  in 

Panels  in 

Panels  in 

Panels  in 

Panels  in 

CPU  Time 

Statistics 

Vectors 

Row  1 

Row  2 

Row  3 

Row  4 

Row  5 

(seconds) 

random  vectors 

1 

10 

2 

7 

3 

67.59 

skewness 

unity  vector 

3 

13 

1 

7 

4 

76.59 

eigenvectors 

2 

4 

6 

3 

1 

55.89 

random  vectors 

3 

7 

2 

8 

4 

110.03 

Kurtosis 

unity  vector 

5 

8 

4 

6 

2 

135.47 

eigenvectors 

3 

15 

16 

2 

1 

155.73 

random  vectors 

8 

2 

5 

4 

3 

103.53 

5th  moment 

unity  vector 

9 

not  found 

3 

4 

2 

103.35 

eigenvector 

3 

8 

7 

2 

1 

95.17 

random  vectors 

1 

6 

9 

3 

2 

6.96 

FastICA 

unity 

9 

7 

5 

4 

1 

9.07 

eigenvectors 

4 

5 

9 

2 

6 

6.22 

Since  the  panels  in  rows  2  and  3  were  made  by 
the  same  materials  with  different  paints,  the  ATDCA 
which  used  the  tATIXA  for  classification  also  classified 
panels  in  row  2  as  shown  in  Fig.  6.  Similarly,  it  was 
true  for  panels  in  rows  4  and  5.  If  we  compare  Fig.  6 
with  Fig.  5(a),  the  ATDCA  performed  significantly 
better  than  the  RX  algorithm.  However,  if  we  further 
compare  Fig.  6  with  Fig.  4(a)-(d),  it  is  obvious  that 
the  ATDCA  could  not  compete  against  the  high-order 
statistics-based  methods  because  the  latter  could  detect 


panels  in  the  five  rows  correctly  in  five  individual  and 
separate  components. 

In  order  to  make  further  comparison,  the  receiver 
operating  characteristics  (ROCs)  analysis  [21]  was 
used  to  evaluate  the  detection  performance  of  the 
four  high-order  statistics  based  methods,  skewness, 
kurtosis,  FastICA  against  the  RX  algorithm  and 
ATDCA  based  on  the  results  in  Figs.  4(a)-(d),  5(a), 
and  6.  Fig.  7(a)— (d)  plot  the  ROC  curves  of  each 
of  high-order  statistics  based  methods  relative  that 
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False  Alarm  Probability 


False  Alarm  Probability 


Fig.  7.  ROC  curves  of  high-order  ATR,  ICA  and  RX.  ATDCA. 


produced  by  the  RX  algorithm  and  ATDCA  where 
all  the  four  high-order  statistics-based  methods 
performed  significantly  better  than  the  RX  algorithm 
and  ATDCA. 

Four  comments  on  Fig.  7  are  noteworthy. 

1)  Since  the  detection  images  produced  in 

Figs.  4(a)-(d),  5(a),  and  6  were  gray  scale,  a  threshold 
must  be  applied  to  segment  panels  from  the  image 
background  for  panel  detection.  In  this  case,  the  ROC 
curves  in  Fig.  7  were  produced  by  varying  a  threshold 
from  the  lowest  gray  scale  to  highest  gray  scale  for 
panel  detection. 

2)  Second,  the  same  threshold  was  used  and 
applied  to  the  image  in  Fig.  5(a)  and  Fig.  6  and  all 
the  ICs  in  Fig.  4(a)-(d). 

3)  Third,  due  to  close  performance  among  all  the 
four  high-order  statistics-based  methods,  it  is  very 
difficult  to  discriminate  their  ROC  curves  one  from 
another.  So,  the  ROC  curves  were  plotted  by  each  of 
the  four  high-order  statistics  based  method  against  the 
RX  algorithm  and  ATDCA  for  clarity. 


4)  Finally,  it  should  be  noted  that  high-order 
statistics-based  methods  take  advantage  of  their 
generated  components  to  perform  target  detection 
and  classification,  while  the  RX  algorithm  cannot 
discriminate  the  targets  it  detects.  As  a  result,  the  RX 
algorithm  can  be  only  used  for  target  detection.  On 
the  other  hand,  the  ATDCA  can  be  used  for  target 
classification  as  it  did  in  Fig.  6.  But  its  second-order 
statistics  has  limited  its  ability  in  detecting  only  three 
panel  pixels,  not  five  panel  pixels  for  classification. 

IV.  CONCLUSION 

This  paper  investigates  high-order  statistics-based 
methods  for  ATR  in  hyperspectral  imagery  where 
third-order  normalized  central  moment  (skewness), 
fourth-order  normalized  central  moment  (kurtosis), 
fifth  normalized  central  moment,  and  infinite 
normalized  central  moment  (statistical  independency) 
are  studied  for  comparative  analysis.  Algorithms 
for  implementing  criteria  of  high-order  statistics  are 
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also  developed  and  considered  to  be  new  except 
the  FastICA  developed  in  [22]  which  is  used  to 
implement  the  criterion  of  statistical  independency. 

In  order  to  demonstrate  the  utility  of  our  proposed 
high-order  statistics-based  methods  using  skewness, 
kurtosis,  5th  moment,  and  statistical  independency 
as  criteria,  two  applications  in  endmember  extraction 
and  target  detection  and  classification  are  explored 
by  experiments.  The  experimental  results  show 
that  high-order  statistics-based  methods  have  clear 
advantages  over  the  second-order  statistics-based 
methods  such  as  PCA. 
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